Embedding strategies for effective use of information from multiple sequence alignments.

Authors

  • S Henikoff
  • J G Henikoff
Abstract

We describe a new strategy for utilizing multiple sequence alignment information to detect distant relationships in searches of sequence databases. A single sequence representing a protein family is enriched by replacing conserved regions with position-specific scoring matrices (PSSMs) or consensus residues derived from multiple alignments of family members. In comprehensive tests of these and other family representations, PSSM-embedded queries produced the best results overall when used with a special version of the Smith-Waterman searching algorithm. Moreover, embedding consensus residues instead of PSSMs improved performance with readily available single sequence query searching programs, such as BLAST and FASTA. Embedding PSSMs or consensus residues into a representative sequence improves searching performance by extracting multiple alignment information from motif regions while retaining single sequence information where alignment is uncertain.
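The consensus-embedding idea in the abstract can be illustrated with a minimal sketch. It assumes that conserved regions are given as ungapped blocks of aligned segments with a known offset in the representative sequence, and it uses a simple majority vote per column. The function names, the block format, and the tie-breaking rule are hypothetical choices made for illustration; the abstract does not specify how the authors derive consensus residues or PSSMs.

```python
from collections import Counter


def column_consensus(segments):
    """Return the most common residue in each column of equal-length segments."""
    length = len(segments[0])
    if any(len(s) != length for s in segments):
        raise ValueError("all segments in a block must have the same length")
    consensus = []
    for column in zip(*segments):
        counts = Counter(column)
        # Assumption: ties are broken alphabetically to keep the result deterministic.
        best = min(counts, key=lambda residue: (-counts[residue], residue))
        consensus.append(best)
    return "".join(consensus)


def embed_consensus(representative, blocks):
    """Replace each conserved region of `representative` with its block consensus.

    `blocks` maps a 0-based start offset in the representative sequence to the
    list of ungapped aligned segments (one per family member) for that region.
    Residues outside the blocks are left as they are in the representative.
    """
    query = list(representative)
    for start, segments in blocks.items():
        cons = column_consensus(segments)
        if start < 0 or start + len(cons) > len(query):
            raise ValueError("block falls outside the representative sequence")
        query[start:start + len(cons)] = cons
    return "".join(query)


if __name__ == "__main__":
    # Toy data, not taken from the paper.
    rep = "MKTAYIAKQRQISFVK"
    toy_blocks = {3: ["AYIA", "AFIA", "SFIA"]}
    print(embed_consensus(rep, toy_blocks))
```

The resulting string is an ordinary single sequence, so it could be passed to a single-sequence search program such as BLAST or FASTA. Embedding PSSMs instead would require a query format that mixes residues with score columns, together with a search program that accepts it.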

Similar resources

Quality assessment of multiple alignment programs.

A renewed interest in the multiple sequence alignment problem has given rise to several new algorithms. In contrast to traditional progressive methods, computationally expensive score optimization strategies are now predominantly employed. We systematically tested four methods (Poa, Dialign, T-Coffee and ClustalW) for the speed and quality of their alignments. As test sequences we used structur...


Molecular cloning of adenylate kinase from the human filarial parasite Onchocerca volvulus

Adenylate kinases (ADK) are ubiquitous enzymes that contribute to the homeostasis of adenine nucleotides in living cells. In this study, the cloning of a cDNA encoding an adenylate kinase from the filaria Onchocerca volvulus has been described. Using PCR technique, a 281 bp cDNA fragment encoding part of an adenylate kinase was isolated from an O. volvulus cDNA library. Use of this fragment as a p...


Integration of Alignment and Phylogeny in the Whole-Genome Era

Abstract of the dissertation: Integration of Alignment and Phylogeny in the Whole-Genome Era, by Hongtao Sun, Doctor of Philosophy in Computer Science, Washington University in St. Louis, 2015. Professor Jeremy Buhler, Chair. With the development of new sequencing techniques, whole genomes of many species have become available. This huge amount of data gives rise to new opportunities and challenges. These new...


Adaptive BLASTing through the Sequence Dataspace: Theories on Protein Sequence Embedding

A major computational challenge in the genomic era is annotating structure/function to the vast quantities of sequence information now available. This problem is illustrated by the fact that most proteins lack comprehensive annotation, even when experimental evidence exists. We theorized that phylogenetic profiles provide a quantitative method that can relate the structural and functional prope...


PROMALS3D: a tool for multiple protein sequence and structure alignments

Although multiple sequence alignments (MSAs) are essential for a wide range of applications from structure modeling to prediction of functional sites, construction of accurate MSAs for distantly related proteins remains a largely unsolved problem. The rapidly increasing database of spatial structures is a valuable source to improve alignment quality. We explore the use of 3D structural informat...


Journal title:
  • Protein science : a publication of the Protein Society

Volume 6, Issue 3

Pages -

Publication date: 1997